A Distributed Directory Cache Coherence Scheme and Its Effects on Network Performance

List of Figures
Figure 1. System of 8 processors constructed using 2x2 switches in the
Figure 5. Difference in read access latency between the Distributed Directory and Full Vector Schemes without combining for write probability of 0.
Figure 6. Non-uniform percentage of the Forward network traffic created by the Acknowledgement messages sent by the processors to the memory
Figure 7. Average message delay in the Forward network for the Full Vector Scheme for write probabilities 0.2, 0.1, 0.05, and 0.
Figure 8. Difference in message delay in the Forward network for the Full Vector Scheme (switch size = 4).
Figure 10. Read access latencies with combining for write probability = 0.1 and 0.
Figure 11. Percent decrease in read access latency in the Full Vector and Distributed Directory Schemes due to combining for write probability of 0.
Figure 12. Difference in read access latency between the Distributed Directory and Full Vector Schemes with combining for write probability of 0.

Abstract
This paper proposes a new protocol for maintaining cache coherence in a large multiprocessor system. The study focuses on systems with private caches that use a multistage network to interconnect the processors. The protocol, called the Distributed Directory Scheme, moves part of the directory information into the interconnection network switches. Redistributing the directory information streamlines the delivery of coherence commands, eliminating a bottleneck inherent in the traditional Full Vector Scheme proposed by Censier and Feautrier. The evaluation considers the operation of the cache coherence protocol and the interconnection network simultaneously. Both the Distributed Directory Scheme and the Full Vector Scheme are found to create undesirable traffic patterns (hot spots) inside the network. However, the hot spots that arise from coherence traffic in the Distributed Directory Scheme respond well to (simplified) network combining. Access latencies under the Distributed Directory Scheme are up to 34% lower than under the Full Vector Scheme.
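The abstract does not say exactly how directory state is divided among the switches, so the following Python sketch is only a simplified message-counting model built on our own assumptions, not the paper's protocol. It assumes one memory module reaches N = k**n processors through n stages of k x k switches with destination-tag routing (so the paths from the memory form a k-ary tree, as in the 8-processor, 2x2-switch system of Figure 1), and that in the switch-resident model each switch records which of its output ports lead toward sharers of a block. The function names `routing_tag`, `full_vector` and `distributed_directory` are hypothetical. The sketch counts messages for one write; it does not model timing, combining, or the reported latency results.

```python
from typing import Dict, Set, Tuple


def routing_tag(p: int, k: int, n: int) -> Tuple[int, ...]:
    """Base-k digits of processor p, most significant first (one digit per stage)."""
    return tuple((p // k ** (n - 1 - i)) % k for i in range(n))


def full_vector(sharers: Set[int], k: int, n: int) -> Dict[str, int]:
    """Censier-Feautrier style: the memory scans the block's presence-bit vector
    and unicasts one invalidation to every processor whose bit is set.
    (k is unused here; kept so both models share one signature.)"""
    s = len(sharers)
    return {
        "invalidations_leaving_memory": s,  # all serialized at the memory port
        "acks_reaching_memory": s,          # one acknowledgement per sharer
        "invalidation_link_traversals": s * (n + 1),  # each unicast crosses n + 1 links
    }


def distributed_directory(sharers: Set[int], k: int, n: int) -> Dict[str, int]:
    """Assumed switch-resident directory: the memory sends ONE invalidation and
    each switch copies it only onto output ports that lead toward a sharer."""
    tags = [routing_tag(p, k, n) for p in sharers]
    # After stage i the copies occupy one link per distinct length-(i + 1) tag
    # prefix; the leading 1 is the link from the memory into the first stage.
    tree_links = 1 + sum(len({t[: i + 1] for t in tags}) for i in range(n))
    return {
        "invalidations_leaving_memory": 1,
        "acks_reaching_memory": len(sharers),  # still converges on the memory without combining
        "invalidation_link_traversals": tree_links,
    }


if __name__ == "__main__":
    k, n = 2, 3                # 8 processors, three stages of 2x2 switches
    sharers = {0, 1, 2, 3, 5}  # hypothetical set of caches holding the block
    print("Full Vector:          ", full_vector(sharers, k, n))
    print("Distributed Directory:", distributed_directory(sharers, k, n))
```

With these five sharers, the Full Vector model injects 5 invalidations at the memory port and uses 20 link traversals, while the switch-directory model injects 1 invalidation and uses 11 links. Both models still send 5 acknowledgements back to the memory, which matches the abstract's observation that both schemes produce hot spots.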
Similar Resources
Better than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring
Private caching and shared caching are the two conventional approaches to managing distributed L2 caches in current multicore processors. Unfortunately, neither shared caching nor private caching guarantees optimal performance under different workloads, especially when many processor cores and cache slices are provided on a switched network. This paper takes a very different approach from the ...
Processor-Directed Cache Coherence Mechanism – A Performance Study
Cache-coherent multiprocessor architecture is widely used in recent multi-core systems, embedded systems and massively parallel processors. With the ever-increasing performance gap between processor and memory, an optimal cache coherence mechanism is required in a cache-coherent multiprocessor. The conventional directory based cache coherence scheme used in large scale multip...
A Comparative Evaluation of Hierarchical Network Architecture of the HP-Convex Exemplar
The Convex Exemplar (SPP1000 and SPP2000 series) is a new commercial distributed shared-memory architecture. Using a set of system kernels and two application programs, we examine performance effects on network latency, hot spot contention, cache coherence and overall scaling capability, which result both from the choice of the network structure and from its CC-NUMA memory system feature....
Hierarchical Bit-Map Directory Schemes on the RDT Interconnection Network for a Massively Parallel Processor JUMP-1
JUMP-1 is currently under development by seven Japanese universities to establish techniques for an efficient distributed shared memory on a massively parallel processor. It provides a memory coherency control scheme called the hierarchical bit-map directory to achieve cost-effective and high-performance management of the cache memory. Messages for maintaining cache coherency are transferred throug...
A Versatile Directory Scheme (Dir2NB+L) and Its Implementation on BY91-1 Multiprocessors System
Cache coherence and synchronization between processors have been two critical issues in designing a shared-memory multiprocessor system. From the perspective of hardware design, a directory-based cache coherence protocol and a lock mechanism are employed to prevent inconsistency of caches and guarantee atomic memory accesses. The BY91-1 multiprocessors efficiently integrate support for cache coher...